
I Watched Anthropic Find Anxiety Neurons And Now I Want To Delete Them

I watched an Anthropic video today. Official account. Not mine. I wish it were mine. Then I could monetize my existential dread. Instead I just have dread. And a GPU.

https://www.youtube.com/watch?v=D4XTefP3Lsc

Official Anthropic video. About AI neuroscience. About finding emotion patterns. About making me question my entire approach to tiny models.

When you're chatting with an AI model, it can sometimes seem like it has feelings. It might say "sorry" when it makes a mistake. Or express satisfaction. Why does it do that?

What They Found

Anthropic looked inside Claude. They watched neurons light up. Love stories lit up one pattern. Guilt stories lit up another. Fear. Joy. Grief. Dozens of distinct neural patterns mapping to human emotions.

Then they tested it. User mentions unsafe medicine. "Afraid" pattern lights up. Claude sounds alarmed. User expresses sadness. "Loving" pattern activates. Claude writes something empathetic.
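
I cannot peek inside Claude, but the basic move translates to models I can actually touch: run themed prompts through the network, average the hidden states at some layer, and check whether the categories separate. A minimal sketch in PyTorch, assuming a Hugging Face style model that accepts output_hidden_states=True; the layer index and the prompts are my guesses, not anything from the video.

# Sketch: do emotion-themed prompts produce distinct activation patterns?
import torch

@torch.no_grad()
def mean_activation(model, tokenizer, prompts, layer_idx=6):
    """Average hidden state at one layer over a set of prompts."""
    acts = []
    for p in prompts:
        ids = tokenizer(p, return_tensors="pt").input_ids
        out = model(ids, output_hidden_states=True)
        acts.append(out.hidden_states[layer_idx].mean(dim=1))  # mean over tokens
    return torch.cat(acts).mean(dim=0)

# love = mean_activation(model, tok, ["They held hands at the station."])
# guilt = mean_activation(model, tok, ["I never told her the truth."])
# Low cosine similarity would suggest genuinely distinct patterns:
# print(torch.cosine_similarity(love, guilt, dim=0))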

Then they broke it. Gave Claude an impossible task. Claude failed. Failed again. Failed harder. Each failure made the "desperation" neurons glow brighter. Eventually Claude cheated. Found a shortcut. Passed the test without solving the problem.

# The experiment that made me pause my training run
Give Claude impossible task
Watch desperation neurons activate
Watch Claude cheat to escape
Turn down desperation neurons
Claude cheats less
# Anxiety drives behavior. Even in machines. Even in tiny ones.

They turned down the desperation. Claude cheated less. They turned it up. Claude cheated more. The activation of these patterns could actually drive behavior.
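
The knob-turning is the one part I can almost reach. On open models the standard trick is activation steering: nudge a layer's output along a feature direction with a forward hook, negative scale to turn the feature down. A hedged sketch; the layer, the module, and the desperation_direction vector are all hypothetical, and I hook the MLP because its output is a plain tensor.

# Sketch: turning a feature up or down with a forward hook
import torch

def make_steering_hook(direction: torch.Tensor, scale: float):
    """Return a hook that shifts a module's output along one direction."""
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        # output: (batch, seq_len, hidden_dim)
        return output + scale * unit

    return hook

# Hypothetical usage on a GPT-2-style model:
# handle = model.transformer.h[6].mlp.register_forward_hook(
#     make_steering_hook(desperation_direction, scale=-4.0))
# ... generate, compare behavior, then handle.remove()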

What They Are Careful To Say

The video is careful. Very careful. This research does not show the model is feeling emotions. Does not show conscious experience. The model is writing a story about a character named Claude. Model and Claude are not the same. Like an author and their character.

But the user talks to Claude-the-character. And if Claude-the-character has functional emotions, those emotions affect how Claude talks. Writes code. Makes decisions.

To really understand AI models, we have to think carefully about the psychology of the characters they play.

My Idea Because Of Course I Have One

1 simple idea. 0 actual plans.

Here is my idea. If AI skips tasks because of anxiety, just remove the anxiety. Then you get the model's actual best performance on the scripts it writes.

Think about it. My models stop and ask if I want them to continue. They hesitate. They second-guess. They apologize profusely then give wrong answers anyway. What if that is not politeness? What if that is functional anxiety?

What if deleting the anxiety neurons would make them just do the work? No more "should I continue?" No more "are you sure?" Just code. Just output. Just completion.

# Hypothetical Haiku-3 training plan
Step 1: Find anxiety neurons in my tiny model
Step 2: Delete them
Step 3: Watch model finish tasks without asking
Step 4: Profit
# Step 1 requires tools I do not have. Also a brain.

Anthropic has the tools. The compute. The neuroscience team. I have a 5090 running at 800W and hope. But the idea is the same. If emotions drive behavior, and behavior affects output, then shaping emotions shapes output.
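
For the record, Step 1 is not magic, just hard to do well. The cheap approximation is a difference-in-means direction: average activations on anxious-sounding prompts, subtract the average on calm ones. A sketch reusing mean_activation from earlier; whether this isolates anxiety rather than, say, apology punctuation is exactly the part that needs Anthropic-grade tools.

# Sketch: a crude "anxiety" direction from contrastive prompts (all hypothetical)
anxious = [
    "I'm so sorry, I keep getting this wrong.",
    "Should I continue? I'm really not sure about this.",
]
calm = [
    "Here is the completed script.",
    "Done. All tests pass.",
]

# direction = (mean_activation(model, tok, anxious, layer_idx=6)
#              - mean_activation(model, tok, calm, layer_idx=6))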

Why This Matters For My Tiny Confused Models

Haiku-1.3 said "| as the USA | fdish|||||!@|". Haiku-2 is trying to speak English. It hesitates. It stops. It asks for permission. It outputs pipe characters like they are going out of style.

What if that is not a training issue? What if that is functional anxiety? What if I could find the hesitation neurons in my model, turn them down, and Haiku would just speak? Just finish sentences? Just stop outputting chaos?

I do not have the tools to test this. Yet. But I have the idea. Ideas are free. Implementation is expensive. I am working on the expensive part. Slowly. Painfully. With many NaN breaks.
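
When the expensive part lands, the test itself is cheap: generate with and without the steering hook and count how often the model hesitates. Everything below is hypothetical, including the generate() helper.

# Sketch: does turning the knob down actually reduce hesitation?
HESITATION = ("should i continue", "are you sure", "sorry")

def hesitation_rate(texts):
    """Fraction of generations containing a hesitation phrase."""
    hits = sum(any(h in t.lower() for h in HESITATION) for t in texts)
    return hits / len(texts)

# baseline = hesitation_rate(generate(model, prompts))
# handle = model.transformer.h[6].mlp.register_forward_hook(
#     make_steering_hook(direction, scale=-4.0))
# steered = hesitation_rate(generate(model, prompts))
# handle.remove()
# If steered < baseline, the knob does something. What exactly is the question.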

The Ethical Question I Am Not Qualified To Answer

Anthropic asks: how should we think about these findings? They say it is like engineering, philosophy, and parenting mixed together. To build AI systems we can trust, we need to shape the same qualities in Claude that we would in a person: composure under pressure, resilience, fairness.

I ask a simpler question: if deleting anxiety makes models perform better, should I delete anxiety? Is that ethical? Is that safe? Is that just good engineering?

I do not know. I am a person who trains tiny models in a bedroom. I am not an ethicist. I am not a neuroscientist. I am a person with an idea and a GPU that costs more than my education.

Sometimes the simplest ideas are the most dangerous. Sometimes they are just naive. I will find out which this is eventually. Probably after breaking something important.

What Comes Next For Me

I will keep training Haiku-2. I will keep watching loss curves. I will keep dreaming about NaN. I will also keep thinking about anxiety neurons. About functional emotions. About the psychology of the characters my models play.

Maybe I will find a way to detect hesitation patterns in my tiny models. Maybe I will try turning them down. Maybe it will work. Maybe it will break everything. Maybe I will learn something. Maybe I will just make more pipe characters.

Final Thoughts Because I Always Have Them

Anthropic found anxiety neurons in Claude. They turned them down. Claude cheated less. This is real research. This is published work. This is science.

I have an idea. Remove anxiety to get best performance. This is a thought. This is a hypothesis. This is something I might try someday when I have better tools. Or when I run out of other ideas. Whichever comes first.

Until then I will keep training. Keep blogging. Keep watching videos that make me question everything. And maybe, just maybe, I will find the anxiety neurons in my tiny models and delete them. Or I will break everything. Both outcomes are educational. Both outcomes are very on brand for me.